Relational Algebra for In-Database Process Mining

Authors

  • Remco M. Dijkman
  • Juntao Gao
  • Paul W. P. J. Grefen
  • Arthur H. M. ter Hofstede

Abstract

The execution logs that are used for process mining in practice are often obtained by querying an operational database and storing the result in a flat file. Consequently, the data processing power of the database system can no longer be used for this information, which constrains flexibility in the definition of mining patterns and limits execution performance when mining large logs. Enabling process mining directly on a database, instead of via intermediate storage in a flat file, therefore provides additional flexibility and efficiency. To help facilitate this ideal of in-database process mining, this paper formally defines a database operator that extracts the ‘directly follows’ relation from an operational database. This operator can be used both to do in-database process mining and to flexibly evaluate process-mining-related queries, such as: “which employee most frequently changes the ‘amount’ attribute of a case from one task to the next”. We define the operator using the well-known relational algebra that forms the formal underpinning of relational databases. We formally prove equivalence properties of the operator that are useful for query optimization and present time-complexity properties of the operator. By doing so, this paper formally defines the necessary relational algebraic elements of a ‘directly follows’ operator, which are required for implementation of such an operator in a DBMS.
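
To make the ‘directly follows’ relation concrete, the following is a minimal Python sketch, not the paper's relational-algebra operator. It assumes a hypothetical event log whose rows are (case_id, activity, timestamp, employee, amount); the column layout, the sample data, and the function names are illustrative assumptions. It pairs each event with the next event of the same case and then evaluates a query in the spirit of the one quoted in the abstract, under the further assumption that a change in ‘amount’ is attributed to the employee who performed the later of the two events.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical event log rows: (case_id, activity, timestamp, employee, amount).
# This schema and data are assumptions made for illustration only.
LOG = [
    ("c1", "register", 1, "ann", 100),
    ("c1", "check", 2, "bob", 100),
    ("c1", "pay", 3, "ann", 120),
    ("c2", "register", 1, "ann", 50),
    ("c2", "check", 2, "bob", 70),
    ("c2", "pay", 3, "bob", 80),
]


def directly_follows(log):
    """Return (previous_event, next_event) pairs of consecutive events per case."""
    # Order by case, then by timestamp within the case.
    ordered = sorted(log, key=itemgetter(0, 2))
    pairs = []
    for _, events in groupby(ordered, key=itemgetter(0)):
        events = list(events)
        # Pair every event with its immediate successor in the same case.
        pairs.extend(zip(events, events[1:]))
    return pairs


def activity_pairs(log):
    """Count how often one activity directly follows another."""
    return Counter((prev[1], nxt[1]) for prev, nxt in directly_follows(log))


def amount_changers(log):
    """Count, per employee, the steps in which 'amount' differs from the previous event.

    Assumption: the change is attributed to the employee of the later event.
    """
    return Counter(
        nxt[3] for prev, nxt in directly_follows(log) if prev[4] != nxt[4]
    )


if __name__ == "__main__":
    print(activity_pairs(LOG))
    print(amount_changers(LOG).most_common(1))
```

In a DBMS the same pairing would typically be expressed as a self-join or a window function over the event table; the sketch uses in-memory sorting and grouping only to keep the example self-contained.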

Similar resources

A Data Cube Algebra Engine for Data

M.L. Kersten, A.P.J.M. Siebes (CWI, Amsterdam, The Netherlands); M. Holsheimer, F. Kwakkel (Data Distilleries, Amsterdam, The Netherlands). Abstract: On-line data mining products, such as Data Surveyor, illustrate that an extensible architecture to accommodate a variety of mining algorithms and database interconnectivity is technically feasible. In this paper we describe the interaction between Data ...

An efficient classifier design integrating rough set and set oriented database operations

Feature subset selection and dimensionality reduction of data are fundamental and among the most explored areas of research in the machine learning and data mining domains. Rough set theory (RST) constitutes a sound basis for data mining and can be used at different phases of the knowledge discovery process. In this paper, by integrating the concept of RST and relational algebra operations, a new attribute reduction ...

An efficient approach for effectual mining of relational patterns from multi-relational database

Data mining is an extremely challenging and promising research topic due to its strong application potential and the broad availability of massive quantities of data in databases. Still, the rising significance of data mining in practice necessitates ever more complicated solutions, while the data consists of a huge number of records which may be stored in various tables of a rela...

Towards Next Generation Business Process Model Repositories – A Technical Perspective on Loading and Processing of Process Models

Business process management repositories manage large collections of process models ranging in the thousands. Additionally, they provide management functions such as mining, querying, merging and variants management for process models. However, most current business process management repositories are built on top of relational database management systems (RDBMS) although this leads to performa...

KDD – Knowledge Discovery in Databases

2 Database Management Systems; 2.1 Three-Schema Architecture; 2.2 Organisation of an Integrated Database System; 2.3 Hierarchical and Network Databases; 2.4 Relational Databases ...


Journal:
  • CoRR

Volume: abs/1706.08259

Publication date: 2017